Purpose of Study:

The present study will examine young adults from the National Longitudinal Study of Adolescent to Adult Health (AddHealth). The goals of the analysis are 1) to establish the relationship between being in an abusive relationship and an individual’s happiness in that relationship; 2) to determine whether different forms of abuse have different impacts on the well-being of individuals in an abusive relationship; and 3) to determine how the association between being in an abusive relationship and an individual’s well-being may differ by ethnicity, age, and sexual orientation.

Variables:

Variables from AddHealth that will be used include: H4RD2Y (the total amount of time, in years, that the individual was involved in a romantic or sexual relationship with their partner), H4RD5 (the number of nights in an average week that the individual and their partner spent the night together), H4RD7B (satisfaction with the way the couple handles their problems and disagreements), H4RD9 (the individual’s level of happiness in the relationship), H4RD18 (the frequency of physical abuse threats and attempts by the partner), H4RD20 (the frequency of physical injuries from fights with the partner), H4RD21 (the frequency of non-consensual sexual activity), and H4TR1 (the number of people the individual has married). The data frame also keeps two smoking variables, H4TO5 (renamed SmokeFreq) and H4TO6 (renamed CigsSmoke).

Data Management

First, the data are placed on the search path by loading the PDS package. The variables of interest are then renamed from their codebook identifiers to more readable names with rename() and kept in the data frame NDF with select(), both from the dplyr package.

library(PDS)    # provides the addhealth_public4 data
library(dplyr)  # rename(), select(), and the %>% pipe
NDF <- addhealth_public4 %>%
  rename(TimeSpent = h4rd2y, NightsSpent = h4rd5, SatisfactionWConflict = h4rd7b,
         HappinessLevel = h4rd9, PhysicalAbuseAttempts = h4rd18, PhysicalInjure = h4rd20,
         SexualAssaults = h4rd21, PeopleMarried = h4tr1, CigsSmoke = h4to6, SmokeFreq = h4to5) %>%
  select(TimeSpent, NightsSpent, SatisfactionWConflict, HappinessLevel, PhysicalAbuseAttempts,
         PhysicalInjure, SexualAssaults, PeopleMarried, CigsSmoke, SmokeFreq)
summary(NDF)
   TimeSpent       NightsSpent    SatisfactionWConflict HappinessLevel 
 Min.   : 0.000   Min.   : 0.00   Min.   : 1.000        Min.   :1.000  
 1st Qu.: 1.000   1st Qu.: 5.00   1st Qu.: 1.000        1st Qu.:1.000  
 Median : 4.000   Median :97.00   Median : 2.000        Median :1.000  
 Mean   : 6.753   Mean   :65.52   Mean   : 2.643        Mean   :2.416  
 3rd Qu.: 8.000   3rd Qu.:97.00   3rd Qu.: 3.000        3rd Qu.:2.000  
 Max.   :98.000   Max.   :98.00   Max.   :98.000        Max.   :8.000  
 NA's   :1536     NA's   :1536    NA's   :1536          NA's   :1536   
 PhysicalAbuseAttempts PhysicalInjure  SexualAssaults    PeopleMarried   
 Min.   : 0.000        Min.   : 0.00   Min.   : 0.0000   Min.   :0.0000  
 1st Qu.: 0.000        1st Qu.:97.00   1st Qu.: 0.0000   1st Qu.:0.0000  
 Median : 0.000        Median :97.00   Median : 0.0000   Median :0.0000  
 Mean   : 1.401        Mean   :84.38   Mean   : 0.7701   Mean   :0.5487  
 3rd Qu.: 0.000        3rd Qu.:97.00   3rd Qu.: 0.0000   3rd Qu.:1.0000  
 Max.   :98.000        Max.   :98.00   Max.   :98.0000   Max.   :8.0000  
 NA's   :1536          NA's   :1536    NA's   :1536      NA's   :1390    
   CigsSmoke       SmokeFreq     
 Min.   :  1.0   Min.   : 0.000  
 1st Qu.: 15.0   1st Qu.: 0.000  
 Median :997.0   Median : 0.000  
 Mean   :640.7   Mean   : 9.258  
 3rd Qu.:997.0   3rd Qu.:25.000  
 Max.   :998.0   Max.   :98.000  
 NA's   :1390    NA's   :1390    
NDF3 <- NDF %>%
  select(HappinessLevel, PhysicalAbuseAttempts, SexualAssaults)
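The rename-then-select step can also be written without dplyr. A minimal base R sketch, using a hypothetical toy data frame with a few lowercase codebook names (the values are made up for illustration), keeps only the mapped columns and relabels them in one pass:

```r
# Hypothetical toy data frame standing in for addhealth_public4 (not real AddHealth data)
raw <- data.frame(
  h4rd9  = c(1, 2, 3, 7),
  h4rd18 = c(0, 1, 96, 2),
  h4to5  = c(0, 25, 3, 0)
)

# Lookup vector: readable name = original codebook identifier
lookup <- c(HappinessLevel = "h4rd9", PhysicalAbuseAttempts = "h4rd18")

# Keep only the mapped columns (in lookup order), then give them the readable names
toy <- raw[, unname(lookup)]
names(toy) <- names(lookup)
str(toy)
```

Keeping the mapping in one named vector makes it easy to see which codebook item each readable name came from.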

Labeling Variables

In the next section of code, the responses are labeled. For each of the variables SatisfactionWConflict, HappinessLevel, PhysicalAbuseAttempts, PhysicalInjure, SexualAssaults, TimeSpent, NightsSpent, and PeopleMarried, the number of responses in each category is first tabulated with xtabs(). Codes that do not represent a substantive answer (for example 96 and 98) are then set to NA, and the remaining codes are converted to factors with readable labels instead of numbers.
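The recode-then-label pattern can be wrapped in a small function. This is a minimal sketch; label_codes() is a hypothetical helper (not part of PDS or dplyr), and the responses below are made up using the HappinessLevel coding:

```r
# Hypothetical helper: keep only the listed valid codes and give each one a
# readable label; any other value (e.g. 6, 7, 8 or 96-98) is not a level,
# so factor() turns it into NA
label_codes <- function(x, codes, labels) {
  stopifnot(length(codes) == length(labels))
  factor(x, levels = codes, labels = labels)
}

# Made-up responses: 1-3 are valid answers, 6-8 are non-substantive codes
happy <- c(1, 2, 1, 3, 7, 6, 2, 8, 1)
happy_f <- label_codes(happy, codes = 1:3,
                       labels = c("very happy", "fairly happy", "not too happy"))
table(happy_f, useNA = "ifany")
```

Listing the valid codes explicitly keeps each label attached to the right code even if a category has no responses; with factor(x, labels = ...) alone, the labels are matched to whichever codes happen to be observed, so a missing category makes the label vector the wrong length.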

The first variable examined is satisfaction with the way the couple handles their problems and disagreements (SatisfactionWConflict).

xtabs(~SatisfactionWConflict, data = NDF)
SatisfactionWConflict
   1    2    3    4    5   96   98 
1670 1743  711  587  233   11   13 
NDF$SatisfactionWConflict[NDF$SatisfactionWConflict==96 | NDF$SatisfactionWConflict==98] <- NA
NDF$SatisfactionWConflict <- factor(NDF$SatisfactionWConflict, labels = c("strongly agree", "agree", "neither agree nor disagree", "disagree","strongly disagree"))[, drop = TRUE]
xtabs(~SatisfactionWConflict, data = NDF)
SatisfactionWConflict
            strongly agree                      agree 
                      1670                       1743 
neither agree nor disagree                   disagree 
                       711                        587 
         strongly disagree 
                       233 

The second variable examined is the individual’s level of happiness in their romantic relationship (HappinessLevel).

xtabs(~HappinessLevel, data = NDF)
HappinessLevel
   1    2    3    6    7    8 
2807  979  258   11  906    7 
NDF$HappinessLevel[NDF$HappinessLevel>=6] <- NA
NDF3$HappinessLevel[NDF3$HappinessLevel>=6] <- NA
NDF$HappinessLevel <- factor(NDF$HappinessLevel, labels = c("very happy", "fairly happy", "not too happy"))[, drop = TRUE]
xtabs(~HappinessLevel, data = NDF)
HappinessLevel
   very happy  fairly happy not too happy 
         2807           979           258 

The third variable examined is the frequency of physical abuse threats and attempts by the partner in the romantic relationship (PhysicalAbuseAttempts).

xtabs(~PhysicalAbuseAttempts, data = NDF)
PhysicalAbuseAttempts
   0    1    2    3    4    5    6    7   96   98 
3880  294  306  186  148   48   25   37   28   16 
NDF$PhysicalAbuseAttempts[NDF$PhysicalAbuseAttempts>=9] <- NA
NDF3$PhysicalAbuseAttempts[NDF3$PhysicalAbuseAttempts>=9] <- NA

NDF$PhysicalAbuseAttempts <- factor(NDF$PhysicalAbuseAttempts, labels = c("never", "this has not happened in the past year, but it did happen before then ", "once in the last year of the relationship","twice in the last year of the relationship", "3 to 5 times in the last year of the relationship","6 to 10 times in the last year of the relationship", "11 to 20 times in the last year of the relationship", "more than 20 times in the last year of the relationship"))[, drop = TRUE]
xtabs(~PhysicalAbuseAttempts, data = NDF)
PhysicalAbuseAttempts
                                                                 never 
                                                                  3880 
this has not happened in the past year, but it did happen before then  
                                                                   294 
                             once in the last year of the relationship 
                                                                   306 
                            twice in the last year of the relationship 
                                                                   186 
                     3 to 5 times in the last year of the relationship 
                                                                   148 
                    6 to 10 times in the last year of the relationship 
                                                                    48 
                   11 to 20 times in the last year of the relationship 
                                                                    25 
               more than 20 times in the last year of the relationship 
                                                                    37 

The fourth variable examined is how often the individual was injured, for example with a sprain, bruise, or cut, because of a fight with their partner (PhysicalInjure).

xtabs(~PhysicalInjure, data = NDF)
PhysicalInjure
   0    1    2    3    4    5    6    7   96   97   98 
 395   74   86   32   35   17    2   12   17 4289    9 
NDF$PhysicalInjure[NDF$PhysicalInjure>=9] <- NA
NDF$PhysicalInjure <- factor(NDF$PhysicalInjure, labels = c("never", "this has not happened in the past year, but it did happen before then ", "once in the last year of the relationship","twice in the last year of the relationship", "3 to 5 times in the last year of the relationship","6 to 10 times in the last year of the relationship", "11 to 20 times in the last year of the relationship", "more than 20 times in the last year of the relationship"))[, drop = TRUE]
xtabs(~PhysicalInjure, data = NDF)
PhysicalInjure
                                                                 never 
                                                                   395 
this has not happened in the past year, but it did happen before then  
                                                                    74 
                             once in the last year of the relationship 
                                                                    86 
                            twice in the last year of the relationship 
                                                                    32 
                     3 to 5 times in the last year of the relationship 
                                                                    35 
                    6 to 10 times in the last year of the relationship 
                                                                    17 
                   11 to 20 times in the last year of the relationship 
                                                                     2 
               more than 20 times in the last year of the relationship 
                                                                    12 

The fifth variable examined is how often the partner insisted on or forced sexual activity that the individual did not want (SexualAssaults).

xtabs(~SexualAssaults, data = NDF)
SexualAssaults
   0    1    2    3    4    5    6    7   96   98 
4622   58   88   70   56   21   10   13   19   11 
NDF$SexualAssaults[NDF$SexualAssaults>=9] <- NA
NDF3$SexualAssaults[NDF3$SexualAssaults>=9] <- NA
NDF$SexualAssaults <- factor(NDF$SexualAssaults, labels = c("never", "this has not happened in the past year, but it did happen before then ", "once in the last year of the relationship","twice in the last year of the relationship", "3 to 5 times in the last year of the relationship","6 to 10 times in the last year of the relationship", "11 to 20 times in the last year of the relationship", "more than 20 times in the last year of the relationship"))[, drop = TRUE]
xtabs(~SexualAssaults, data = NDF)
SexualAssaults
                                                                 never 
                                                                  4622 
this has not happened in the past year, but it did happen before then  
                                                                    58 
                             once in the last year of the relationship 
                                                                    88 
                            twice in the last year of the relationship 
                                                                    70 
                     3 to 5 times in the last year of the relationship 
                                                                    56 
                    6 to 10 times in the last year of the relationship 
                                                                    21 
                   11 to 20 times in the last year of the relationship 
                                                                    10 
               more than 20 times in the last year of the relationship 
                                                                    13 

The sixth variable examined is the total amount of time, in years, that the individual has been involved in a romantic or sexual relationship with their partner (TimeSpent).

xtabs(~TimeSpent, data = NDF)
TimeSpent
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17 
946 537 438 406 343 341 305 322 296 224 255 138 127  78  43  31  11   8 
 18  19  96  98 
  1   1  57  60 
NDF$TimeSpent[NDF$TimeSpent == 96] <- NA
NDF$TimeSpent[NDF$TimeSpent == 98] <- NA
NDF$TimeSpent <- factor(NDF$TimeSpent, labels = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19))[, drop = TRUE]
xtabs(~TimeSpent, data = NDF)
TimeSpent
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17 
946 537 438 406 343 341 305 322 296 224 255 138 127  78  43  31  11   8 
 18  19 
  1   1 

The seventh variable examined is the number of nights in an average week that the individual and their partner spent together (NightsSpent).

xtabs(~NightsSpent, data = NDF)
NightsSpent
   0    1    2    3    4    5    6    7   95   96   97   98 
 327  311  261  201  128  128   63  241   12   14 3273    9 
NDF$NightsSpent[NDF$NightsSpent>=8] <- NA
NDF$NightsSpent <- factor(NDF$NightsSpent, labels = c(0, 1, 2, 3, 4, 5, 6, 7))[, drop = TRUE]
xtabs(~NightsSpent, data = NDF)
NightsSpent
  0   1   2   3   4   5   6   7 
327 311 261 201 128 128  63 241 

The eighth variable examined is the number of people the individual has ever married, including the current spouse if the individual is currently married (PeopleMarried).

xtabs(~PeopleMarried, data = NDF)
PeopleMarried
   0    1    2    3    4    6    8 
2568 2331  197    9    1    7    1 
NDF$PeopleMarried[NDF$PeopleMarried>=5] <- NA
NDF$PeopleMarried <- factor(NDF$PeopleMarried, labels = c(0, 1, 2, 3, 4))[, drop = TRUE]
xtabs(~PeopleMarried, data = NDF)
PeopleMarried
   0    1    2    3    4 
2568 2331  197    9    1 

Graphs

The bar plots are created with the ggplot2 package, and the two histograms use base R's hist(). The bar plots start from the defaults of geom_bar() and add more detail with each graph.

The first graph shows satisfaction with how individuals resolve conflicts and arguments in the relationship, with responses ranging from strongly agree to strongly disagree. Most respondents agree or strongly agree, and the counts fall off toward disagreement, so the distribution is right-skewed. The plot also has a large NA bar: the 1,536 respondents with missing data plus the 24 responses coded 96 or 98 that were recoded to NA.

library(ggplot2)

ggplot(data = NDF, aes(x = SatisfactionWConflict, fill = SatisfactionWConflict)) + 
  geom_bar() + 
  labs(title = "I (am/was) satisfied with the way we handle our problems and disagreements",
       x = "Response to the way individuals \n handle their problems and disagreements in a relationship") +
  theme_bw() +
  theme(axis.text.x  = element_text(angle = 75, vjust = 0.5)) +
  guides(fill = guide_legend(title = "Satisfaction with the way we handle \nour problems and disagreements"))

The second graph shows the individuals’ level of happiness in their romantic relationship: very happy, fairly happy, or not too happy. Most respondents are very happy, so the distribution is right-skewed.

ggplot(data = NDF, aes(x = HappinessLevel, fill = HappinessLevel)) + 
  geom_bar() + 
  labs(title = " In general, how happy are you in your relationship with {initials}?", x = "Response about how happy \n are the individuals in their relationship?") +
  theme_bw() +
  theme(axis.text.x  = element_text(angle = 75, vjust = 0.5)) + 
  guides(fill = guide_legend(title = "In general, how happy the individuals are in their relationship?"))

From the third to the fifth graph, individuals choose among eight responses, from “never” and “this has not happened in the past year, but it did happen before then” to “more than 20 times in the last year of the relationship”. The third graph shows the frequency of physical abuse threats and attempts by the partner in the romantic relationship. I omitted the missing values so the distribution of the actual responses is easier to read. Most respondents answered never, and the distribution is right-skewed.

ggplot(data = subset(NDF, !is.na(PhysicalAbuseAttempts)),   # drop missing values of the plotted variable only
       aes(x = PhysicalAbuseAttempts, fill = PhysicalAbuseAttempts)) + 
  geom_bar() + 
  labs(title = "How often (has/did) {initials} (threatened/threaten) you with violence, (pushed/push) \n or (shoved/shove) you, or (thrown/throw) something at you that could hurt? ", x = "Response to the frequency of physical abuse threats and attempts") +
  theme_bw() +
  theme(axis.text.x  = element_text(angle = 75, vjust = 0.5, size = 13),
        legend.key.size = grid::unit(12, "pt")) +   # key size is a theme setting, not a guide_legend() argument
  guides(fill = guide_legend(title = "The frequency of physical abuse threats and attempts"))

The fourth graph shows how often individuals were injured in fights with their partner. An overwhelming majority of respondents (4,289) are coded as a legitimate skip (code 97) for this question, and it is hard to say from these data why. I omitted those responses, together with the missing values, so the distribution of the remaining answers is easier to read. The distribution is right-skewed: the majority responded never, but a considerable number of individuals reported physical injuries in their relationships.

ggplot(data = subset(NDF, !is.na(PhysicalInjure)),   # drop missing values of the plotted variable only
       aes(x = PhysicalInjure, fill = PhysicalInjure)) + 
  geom_bar() + 
  labs(title = "How often (have/did) you (had/have) an injury, such as \n a sprain, bruise, or cut because of a fight with {initials}? ", x = "Response to the frequency of physical injuries") +
  theme_bw() +
  theme(axis.text.x  = element_text(angle = 75, vjust = 0.5, size = 13)) +
  guides(fill = guide_legend(title = "The frequency of physical injuries"))
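When dropping missing values before a plot, note that na.omit() applied to a whole data frame removes every row with a missing value in any column, not only in the variable being plotted. NDF also holds variables with many missing values (NightsSpent ends up with 4,844 NAs after recoding), so na.omit(NDF) can discard respondents who did answer the plotted question. A minimal sketch on a hypothetical toy data frame shows the difference:

```r
# Hypothetical toy data frame: the plotted variable (injure) is missing in one row,
# while another column (nights) is missing in most rows
toy <- data.frame(
  injure = c("never", "once", NA, "never", "twice"),
  nights = c(NA, NA, 3, NA, 5)
)

nrow(na.omit(toy))                 # rows complete in every column
# 1
nrow(toy[!is.na(toy$injure), ])    # rows with a non-missing plotted variable
# 4
```

Filtering on the plotted column alone, for example with subset(NDF, !is.na(PhysicalInjure)), keeps every respondent who answered that question.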

The fifth graph shows how often individuals were pressured or forced by their partner into non-consensual sexual activity. I omitted the missing values so the distribution of the responses is easier to read. The distribution is right-skewed, and a much larger share of respondents answered never to this question (4,622 of 4,938) than to the previous question on physical injuries (395 of 653).

ggplot(data = subset(NDF, !is.na(SexualAssaults)),   # drop missing values of the plotted variable only
       aes(x = SexualAssaults, fill = SexualAssaults)) + 
  geom_bar() + 
  labs(title = " How often (has/did) {initials} (insisted/insist) on or (made/make) \n you have sexual relations with (him/her) when you didn't want to?", x = "Response to the frequency of non-consensual sexual activities") +
  theme_bw() +
  theme(axis.text.x  = element_text(angle = 75, vjust = 0.5, size = 13)) +
  guides(fill = guide_legend(title = "The frequency of non-consensual sexual activities"))

The sixth graph shows the amount of time, in years, that individuals have spent with their partner in a romantic or sexual relationship. I omitted the missing values. The mean is about 4.576 years and the median is 4 years; since the mean exceeds the median and a few relationships last much longer (up to 19 years), the distribution is right-skewed.

# TimeSpent is a factor whose labels are the year values; convert the labels back to numbers
NDF$TimeSpent <- as.numeric(as.character(NDF$TimeSpent))
# Breaks halfway between whole years give one bar per year (0 through 19)
hist(NDF$TimeSpent, xlab = "Time Spent Together (in years)", main = "The amount of time that the individual \n spent with their significant other", col = "light blue", breaks = seq(-0.5, 19.5, by = 1))

summary(NDF$TimeSpent)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   1.000   4.000   4.576   7.500  19.000    1653 
fivenum(NDF$TimeSpent)
[1]  0.0  1.0  4.0  7.5 19.0
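The placement of histogram breaks matters for whole-number data. By default hist() uses right-closed bins (right = TRUE) and includes the lowest break in the first bin (include.lowest = TRUE), so with breaks at 0, 1, 2, ... the values 0 and 1 land in the same first bar. Placing the breaks halfway between integers gives each value its own bar. A small sketch with made-up values:

```r
# Made-up whole-number values (e.g. years together), for illustration only
x <- c(0, 0, 1, 2, 2, 2, 3)

# Breaks at the integers: bins are [0, 1], (1, 2], (2, 3],
# so the two 0s and the single 1 share the first bin
hist(x, breaks = 0:3, plot = FALSE)$counts
# 3 3 1

# Breaks halfway between integers: one bin per value 0, 1, 2, 3
hist(x, breaks = seq(-0.5, 3.5, by = 1), plot = FALSE)$counts
# 2 1 3 1
```

For TimeSpent (0 to 19 years) the matching choice is breaks = seq(-0.5, 19.5, by = 1).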

The seventh graph shows the number of nights in an average week that individuals spent with their partner. I omitted the missing values. The mean is about 2.803 nights and the median is 2 nights; since the mean exceeds the median, the distribution is right-skewed, although there is a second peak at 7 nights (241 respondents).

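# NightsSpent is a factor whose labels are the night counts; convert the labels back to numbers
NDF$NightsSpent <- as.numeric(as.character(NDF$NightsSpent))
# Breaks halfway between whole numbers give one bar per value (0 through 7)
hist(NDF$NightsSpent, xlab = "Nights Spent Together in an Average Week", main = "The number of nights in an average week that the individual \n spent with their significant other", col = "light blue", breaks = seq(-0.5, 7.5, by = 1))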

summary(NDF$NightsSpent)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   1.000   2.000   2.803   5.000   7.000    4844 
fivenum(NDF$NightsSpent)
[1] 0 1 2 5 7
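The summary statistics can be checked against the frequency table of NightsSpent shown earlier. A short base R sketch that copies those counts by hand (so it does not need the AddHealth data):

```r
# Counts of NightsSpent (0-7 nights per week) copied from the xtabs() output
nights <- 0:7
counts <- c(327, 311, 261, 201, 128, 128, 63, 241)

sum(counts)                      # respondents with a valid answer
# 1660
weighted.mean(nights, counts)    # mean number of nights
# about 2.803
median(rep(nights, counts))      # expand the table back to individual values
# 2
```

This reproduces the mean and median reported by summary(), and the 1,660 valid answers plus the 4,844 NAs account for all 6,504 rows, which is a quick check that the recoding lost no valid responses.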

The eighth graph is a bivariate stacked bar chart: for each frequency of physical abuse threats and attempts by the partner, it shows the fraction of individuals at each level of satisfaction with how conflicts are handled.

ggplot(data = NDF, aes(x = PhysicalAbuseAttempts, fill = SatisfactionWConflict)) + 
  geom_bar(position = "fill") +
  theme_bw() + 
  labs(x = "", y = "Fraction", 
       title = "Fraction of Physical Abuse Attempts Frequencies \ntargeting individuals in the relationship \nby their Satisfaction Level with Conflict") + 
  scale_fill_manual(values = c("red", "green", "orange", "blue", "violet"), name = "Satisfaction Level with Conflict Status") + 
  guides(fill = guide_legend(reverse = TRUE)) + 
  theme(axis.text.x  = element_text(angle = 85, vjust = 0.5, size = 14))
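With geom_bar(position = "fill"), each bar is rescaled to a height of 1, so the segments are row proportions of a two-way table. A minimal sketch that computes the same fractions with table() and prop.table() on hypothetical toy responses (the short category names are made up):

```r
# Hypothetical toy responses (not AddHealth data)
abuse     <- c("never", "never", "never", "once", "once", "twice")
satisfied <- c("agree", "agree", "disagree", "agree", "disagree", "disagree")

tab <- table(abuse, satisfied)
# margin = 1: divide each row (one x-axis category) by its total, so rows sum to 1
fractions <- prop.table(tab, margin = 1)
round(fractions, 3)
```

One difference to keep in mind: geom_bar() also stacks missing fill values as their own (grey, by default) segment, while table() drops NAs unless useNA = "ifany" is given.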

The next graph is a bivariate stacked bar chart showing, for each frequency of non-consensual sexual activity (sexual assault), the fraction of individuals at each level of satisfaction with how conflicts are handled.

ggplot(data = NDF, aes(x = SexualAssaults, fill = SatisfactionWConflict)) + 
  geom_bar(position = "fill") +
  theme_bw() + 
  labs(x = "", y = "Fraction", 
       title = "Fraction of Sexual Assaults Frequencies \ntargeting individuals in the relationship \nby their Satisfaction Level with Conflict") + 
  scale_fill_manual(values = c("red", "green", "orange", "blue", "violet"), name = "Satisfaction Level with Conflict Status") + 
  guides(fill = guide_legend(reverse = TRUE)) + 
  theme(axis.text.x  = element_text(angle = 85, vjust = 0.5, size = 14))

The next graph is a bivariate stacked bar chart showing, for each happiness level, the fraction of individuals at each level of satisfaction with how conflicts are handled.

ggplot(data = NDF, aes(x = HappinessLevel, fill = SatisfactionWConflict)) + 
  geom_bar(position = "fill") +
  theme_bw() + 
  labs(x = "", y = "Fraction", 
       title = "Fraction of Happiness Level of individuals in the relationship \nby their Satisfaction Level with Conflict") + 
  scale_fill_manual(values = c("red", "orange", "green",  "blue", "violet"), name = "Satisfaction Level with Conflict Status") + 
  guides(fill = guide_legend(reverse = TRUE)) + 
  theme(axis.text.x  = element_text(angle = 85, vjust = 0.5, size = 14))

The next graph is a multivariate stacked bar chart. For each frequency of sexual assault (x-axis) and each frequency of physical abuse threats and attempts by the partner (one panel per frequency), it shows the fraction of individuals at each happiness level.

ggplot(data = NDF, aes(x = SexualAssaults, fill = HappinessLevel)) + 
  geom_bar(position = "fill") +
  theme_bw() + 
  labs(x = "", y = "Fraction", 
       title = "Fraction of happiness levels by frequency of sexual assaults,\n in panels by frequency of physical abuse attempts by the partner") + 
  scale_fill_manual(values = c("red", "yellow", "blue"), name = "Happiness Level Status") + 
  guides(fill = guide_legend(reverse = TRUE)) + 
  facet_grid(PhysicalAbuseAttempts ~ .) + 
  theme(axis.text.x  = element_text(angle = 85, vjust = 0.5, size = 14))
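In the faceted version each panel is one level of PhysicalAbuseAttempts, and within a panel each bar is rescaled to 1, so the plotted fractions are conditional on both the panel and the x-axis category. A sketch with a three-way table() on hypothetical toy data (the short category names are made up):

```r
# Hypothetical toy data: facet variable, x-axis variable, and fill variable
abuse  <- c("never", "never", "never", "once", "once", "once")      # panels
sexual <- c("never", "never", "once", "never", "once", "never")     # x-axis
happy  <- c("very", "fairly", "very", "very", "not too", "very")    # fill

tab3 <- table(abuse, sexual, happy)
# margin = c(1, 2): fractions within each (panel, bar) combination
frac3 <- prop.table(tab3, margin = c(1, 2))
frac3
```

A combination with no respondents would give 0/0 = NaN in prop.table(); geom_bar() simply draws no bar there.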

ggplot(data = NDF, aes(x = PhysicalAbuseAttempts, fill = HappinessLevel)) + 
  geom_bar(position = "fill") +
  theme_bw() + 
  labs(x = "", y = "Fraction", 
       title = "Fraction of Physical Abuse Frequencies \n targeting individuals in the relationship \nby their Happiness Level") + 
  scale_fill_manual(values = c("red", "yellow", "blue"), name = "Happiness Level Status:") + 
  guides(fill = guide_legend(reverse = TRUE)) + 
  theme(axis.text.x  = element_text(angle = 70, vjust = 0.5, size = 14))

ggplot(data = NDF, aes(x = SexualAssaults, fill = HappinessLevel)) + 
  geom_bar(position = "fill") +
  theme_bw() + 
  labs(x = "", y = "Fraction", 
       title = "Fraction of Sexual Assaults Frequencies \n targeting individuals in the relationship \nby their Happiness Level") + 
  scale_fill_manual(values = c("red", "yellow", "blue"), name = "Happiness Level Status:") + 
  guides(fill = guide_legend(reverse = TRUE)) + 
  theme(axis.text.x  = element_text(angle = 70, vjust = 0.5, size = 14))

ANOVA

NDF2 <- NDF %>%
  select(TimeSpent, PhysicalAbuseAttempts)
NDF2 = na.omit(NDF2)
NDF11 <- NDF %>%
  select(TimeSpent, SexualAssaults)
NDF11 = na.omit(NDF11)
summary(NDF2)
   TimeSpent    
 Min.   : 0.00  
 1st Qu.: 1.00  
 Median : 4.00  
 Mean   : 4.58  
 3rd Qu.: 8.00  
 Max.   :19.00  
                
                                                            PhysicalAbuseAttempts
 never                                                                 :3805     
 once in the last year of the relationship                             : 298     
 this has not happened in the past year, but it did happen before then : 294     
 twice in the last year of the relationship                            : 180     
 3 to 5 times in the last year of the relationship                     : 139     
 6 to 10 times in the last year of the relationship                    :  48     
 (Other)                                                               :  61     
tapply(NDF2$TimeSpent, NDF2$PhysicalAbuseAttempts, mean)
                                                                 never 
                                                              4.396321 
this has not happened in the past year, but it did happen before then  
                                                              6.619048 
                             once in the last year of the relationship 
                                                              4.657718 
                            twice in the last year of the relationship 
                                                              5.200000 
                     3 to 5 times in the last year of the relationship 
                                                              4.410072 
                    6 to 10 times in the last year of the relationship 
                                                              4.145833 
                   11 to 20 times in the last year of the relationship 
                                                              4.240000 
               more than 20 times in the last year of the relationship 
                                                              5.111111 
tapply(NDF2$TimeSpent, NDF2$PhysicalAbuseAttempts, sd)
                                                                 never 
                                                              4.010775 
this has not happened in the past year, but it did happen before then  
                                                              3.473613 
                             once in the last year of the relationship 
                                                              3.941956 
                            twice in the last year of the relationship 
                                                              3.959854 
                     3 to 5 times in the last year of the relationship 
                                                              3.698439 
                    6 to 10 times in the last year of the relationship 
                                                              3.326167 
                   11 to 20 times in the last year of the relationship 
                                                              3.072458 
               more than 20 times in the last year of the relationship 
                                                              3.955306 
PAA <- rep(NA, length(NDF2$PhysicalAbuseAttempts))

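The following boxplots show the time spent together (in years) for each frequency of physical abuse threats and attempts, and then for each frequency of sexual assault. Apart from some outliers, the groups look broadly similar, and the time spent within each group is slightly right-skewed. The clearest exception in the group means computed above is the “this has not happened in the past year, but it did happen before then” group for physical abuse, with a mean of 6.62 years compared with 4.15 to 5.20 years for the other groups.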

library(ggplot2)
ggplot(data = NDF2, aes(x = PhysicalAbuseAttempts, y = TimeSpent, fill = PhysicalAbuseAttempts)) +
  geom_boxplot() +
  theme_bw() + 
  guides(fill = "none") + 
  labs(x = "Physical Abuse Frequency", y = "Time Spent Together(in years)", title = "Physical Abuse Frequency By Time Spent Together") +
  theme(axis.text.x  = element_text(angle = 75, vjust = 0.5, size = 14))

library(ggplot2)
ggplot(data = NDF11, aes(x = SexualAssaults, y = TimeSpent, fill = SexualAssaults)) +
  geom_boxplot() +
  theme_bw() + 
  guides(fill = "none") + 
  labs(x = "Sexual Assaults Frequency", y = "Time Spent Together(in years)", title = "Sexual Assaults By Time Spent Together") +
  theme(axis.text.x  = element_text(angle = 75, vjust = 0.5, size = 14))
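The points drawn individually in these boxplots are values more than 1.5 times the interquartile range beyond the box; geom_boxplot() uses this 1.5 × IQR rule by default (its coef argument). Base R's boxplot.stats() applies the same rule, using Tukey's hinges from fivenum(), and lists the flagged values, as in this sketch on made-up numbers:

```r
# Made-up years together within one group, for illustration only
years <- c(1, 2, 3, 4, 100)

# coef = 1.5: values beyond 1.5 * (upper hinge - lower hinge) from the hinges are outliers
bs <- boxplot.stats(years, coef = 1.5)
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
# 1 2 3 4 4
bs$out     # values drawn as individual outlier points
# 100
```

Here the hinges are 2 and 4, so anything above 4 + 1.5 × 2 = 7 is flagged; ggplot2 computes its box from sample quantiles rather than Tukey's hinges, so the two can differ slightly on small groups.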

The violin plot below shows the same comparison as the first boxplot, displaying the full shape of the distribution of time spent together within each frequency of physical abuse threats and attempts. Each distribution is slightly right-skewed.

ggplot(data = NDF2, aes(x = PhysicalAbuseAttempts, y = TimeSpent, fill = PhysicalAbuseAttempts)) +
  geom_violin() +
  theme_bw() + 
  guides(fill = "none") + 
  labs(x = "Physical Abuse Frequency", y = "Time Spent Together(in years)", title = "Physical Abuse Frequency By Time Spent Together") +
  theme(axis.text.x  = element_text(angle = 75, vjust = 0.5, size = 14))

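\(H_0\): \(\mu_0 = \mu_1 = \cdots = \mu_7\), where \(\mu_k\) is the mean time spent together (in years) for individuals in physical abuse attempts category \(k\) (coded 0 to 7 below).

\(H_a\): \(\mu_j \neq \mu_k\) for at least one pair of categories \(j \neq k\).

In words, \(H_0\): There is no association between time spent together and the frequency of physical abuse attempts in the relationship.
\(H_a\): There is an association between time spent together and the frequency of physical abuse attempts in the relationship.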
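For reference, aov() tests the null hypothesis with the one-way ANOVA F ratio, which compares how far the group means spread around the overall mean with how much individuals vary within their own group. Writing \(y_{ik}\) for the time spent together by individual \(i\) in physical abuse group \(k\), \(n_k\) and \(\bar{y}_k\) for the size and mean of group \(k\), \(\bar{y}\) for the overall mean, \(K\) for the number of groups, and \(N\) for the total sample size:

\[
F = \frac{\sum_{k=1}^{K} n_k \left(\bar{y}_k - \bar{y}\right)^2 / (K - 1)}{\sum_{k=1}^{K} \sum_{i=1}^{n_k} \left(y_{ik} - \bar{y}_k\right)^2 / (N - K)}
\]

Here \(K = 8\) and \(N = 4825\), so the degrees of freedom are \(K - 1 = 7\) and \(N - K = 4817\). Under the usual assumptions (independent observations and roughly normal errors with equal variance in every group), \(F\) follows an \(F_{7,\,4817}\) distribution when the null hypothesis is true, and large values of \(F\) count as evidence against it.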

PAA[NDF2$PhysicalAbuseAttempts == "never"] <- "0"
PAA[NDF2$PhysicalAbuseAttempts == "this has not happened in the past year, but it did happen before then "] <- "1"
PAA[NDF2$PhysicalAbuseAttempts == "once in the last year of the relationship"] <- "2"
PAA[NDF2$PhysicalAbuseAttempts == "twice in the last year of the relationship"] <- "3"
PAA[NDF2$PhysicalAbuseAttempts == "3 to 5 times in the last year of the relationship"] <- "4"
PAA[NDF2$PhysicalAbuseAttempts == "6 to 10 times in the last year of the relationship"] <- "5"
PAA[NDF2$PhysicalAbuseAttempts == "11 to 20 times in the last year of the relationship"] <- "6"
PAA[NDF2$PhysicalAbuseAttempts == "more than 20 times in the last year of the relationship"] <- "7"

summary(PAA)
   Length     Class      Mode 
     4825 character character 
# Name the columns directly; factor() makes summary() count each category
DF <- data.frame(TimeSpent = NDF2$TimeSpent,
                 PhysicalAbuseAttempts = factor(PAA))
summary(DF)
   TimeSpent     PhysicalAbuseAttempts
 Min.   : 0.00   0      :3805         
 1st Qu.: 1.00   2      : 298         
 Median : 4.00   1      : 294         
 Mean   : 4.58   3      : 180         
 3rd Qu.: 8.00   4      : 139         
 Max.   :19.00   5      :  48         
                 (Other):  61         
mod1 <- aov(TimeSpent ~ PhysicalAbuseAttempts, data = DF)
mod1
Call:
   aov(formula = TimeSpent ~ PhysicalAbuseAttempts, data = DF)

Terms:
                PhysicalAbuseAttempts Residuals
Sum of Squares                1447.84  75331.29
Deg. of Freedom                     7      4817

Residual standard error: 3.954571
Estimated effects may be unbalanced
summary(mod1)
                        Df Sum Sq Mean Sq F value Pr(>F)    
PhysicalAbuseAttempts    7   1448  206.83   13.23 <2e-16 ***
Residuals             4817  75331   15.64                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Pval <- summary(mod1)[[1]][["Pr(>F)"]][[1]]
Pval
[1] 5.406079e-17
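The F value and p-value in the table can be checked by hand. A minimal base R sketch, plugging in the sums of squares and degrees of freedom printed by aov() (the sums of squares are rounded as printed, so the last digits can differ slightly):

```r
# Sums of squares and degrees of freedom copied from the aov() output
ss_between <- 1447.84;  df_between <- 7
ss_within  <- 75331.29; df_within  <- 4817

ms_between <- ss_between / df_between   # mean square for PhysicalAbuseAttempts
ms_within  <- ss_within / df_within     # residual mean square
f_stat  <- ms_between / ms_within
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)   # upper-tail F probability

c(F = f_stat, residual_se = sqrt(ms_within), p = p_value)
```

This gives an F statistic of about 13.23 and a residual standard error of about 3.9546, matching the output above, with a p-value far below 0.01.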

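Since the p-value, \(5.41 \times 10^{-17}\), is less than 0.01, the null hypothesis is rejected. There is very strong evidence of an association between the time (in years) individuals have spent with their partner and the frequency of physical abuse attempts. The Tukey comparisons that follow show category 1 (“this has not happened in the past year, but it did happen before then”) with a significantly higher mean than categories 0, 2, 3, 4, and 5 (adjusted p < 0.01).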

by(DF$TimeSpent, DF$PhysicalAbuseAttempts, mean, na.rm = TRUE)
DF$PhysicalAbuseAttempts: 0
[1] 4.396321
-------------------------------------------------------- 
DF$PhysicalAbuseAttempts: 1
[1] 6.619048
-------------------------------------------------------- 
DF$PhysicalAbuseAttempts: 2
[1] 4.657718
-------------------------------------------------------- 
DF$PhysicalAbuseAttempts: 3
[1] 5.2
-------------------------------------------------------- 
DF$PhysicalAbuseAttempts: 4
[1] 4.410072
-------------------------------------------------------- 
DF$PhysicalAbuseAttempts: 5
[1] 4.145833
-------------------------------------------------------- 
DF$PhysicalAbuseAttempts: 6
[1] 4.24
-------------------------------------------------------- 
DF$PhysicalAbuseAttempts: 7
[1] 5.111111
TukeyHSD(mod1)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = TimeSpent ~ PhysicalAbuseAttempts, data = DF)

$PhysicalAbuseAttempts
           diff        lwr        upr     p adj
1-0  2.22272699  1.4968754  2.9485785 0.0000000
2-0  0.26139749 -0.4599178  0.9827128 0.9572290
3-0  0.80367937 -0.1109822  1.7183409 0.1338936
4-0  0.01375131 -1.0217331  1.0492357 1.0000000
5-0 -0.25048730 -1.9921393  1.4911648 0.9998653
6-0 -0.15632063 -2.5624108  2.2497695 0.9999994
7-0  0.71479048 -1.2931619  2.7227429 0.9611603
2-1 -1.96132950 -2.9470162 -0.9756428 0.0000001
3-1 -1.41904762 -2.5538993 -0.2841960 0.0037807
4-1 -2.20897568 -3.4432813 -0.9746700 0.0000017
5-1 -2.47321429 -4.3399318 -0.6064967 0.0015392
6-1 -2.37904762 -4.8771574  0.1190621 0.0752629
7-1 -1.50793651 -3.6252828  0.6094098 0.3768112
3-2  0.54228188 -0.5896738  1.6742375 0.8323596
4-2 -0.24764618 -1.4792897  0.9839973 0.9987651
5-2 -0.51188479 -2.3768431  1.3530736 0.9913003
6-2 -0.41771812 -2.9145136  2.0790773 0.9996300
7-2  0.45339299 -1.6624025  2.5691885 0.9981396
4-3 -0.78992806 -2.1439058  0.5640497 0.6412543
5-3 -1.05416667 -3.0020834  0.8937501 0.7252662
6-3 -0.96000000 -3.5193549  1.5993549 0.9486170
7-3 -0.08888889 -2.2781583  2.1003805 1.0000000
5-4 -0.26423861 -2.2717251  1.7432478 0.9999259
6-4 -0.17007194 -2.7750517  2.4349079 0.9999994
7-4  0.70103917 -1.5413976  2.9434760 0.9812392
6-5  0.09416667 -2.8633735  3.0517068 1.0000000
7-5  0.96527778 -1.6785162  3.6090718 0.9554778
7-6  0.87111111 -2.2506776  3.9928998 0.9903917
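The significant comparisons can also be pulled out programmatically instead of being read off the table. On the fitted model, something like `tk <- TukeyHSD(mod1)$PhysicalAbuseAttempts; tk[tk[, "p adj"] < 0.05, ]` would do this. The self-contained sketch below applies the same filter to the `p adj` values copied from the output above.

```r
# Adjusted p-values copied from the TukeyHSD(mod1) output above
p_adj <- c("1-0" = 0.0000000, "2-0" = 0.9572290, "3-0" = 0.1338936, "4-0" = 1.0000000,
           "5-0" = 0.9998653, "6-0" = 0.9999994, "7-0" = 0.9611603, "2-1" = 0.0000001,
           "3-1" = 0.0037807, "4-1" = 0.0000017, "5-1" = 0.0015392, "6-1" = 0.0752629,
           "7-1" = 0.3768112, "3-2" = 0.8323596, "4-2" = 0.9987651, "5-2" = 0.9913003,
           "6-2" = 0.9996300, "7-2" = 0.9981396, "4-3" = 0.6412543, "5-3" = 0.7252662,
           "6-3" = 0.9486170, "7-3" = 1.0000000, "5-4" = 0.9999259, "6-4" = 0.9999994,
           "7-4" = 0.9812392, "6-5" = 1.0000000, "7-5" = 0.9554778, "7-6" = 0.9903917)

# Keep only comparisons whose family-wise adjusted p-value is below 0.05
significant <- names(p_adj)[p_adj < 0.05]
significant
```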

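There are five statistically significant differences (p adj < 0.05). All of them involve category 1, “this has not happened in the past year, but it did happen before then”, whose mean time spent together (6.62 years) is significantly longer than that of:

-“never” (1-0)

-“once in the last year of the relationship” (2-1)

-“twice in the last year of the relationship” (3-1)

-“3 to 5 times in the last year of the relationship” (4-1)

-“6 to 10 times in the last year of the relationship” (5-1)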

Chi-Square Test

NDF4 <- data.frame(NDF3$PhysicalAbuseAttempts, NDF3$SexualAssaults, NDF3$HappinessLevel)
NDF4 <- NDF4 %>%
  rename(PhysicalAbuseAttempts = NDF3.PhysicalAbuseAttempts, SexualAssaults = NDF3.SexualAssaults, HappinessLevel = NDF3.HappinessLevel)
NDF3$PhysicalAbuseAttempts <- factor(ifelse(NDF3$PhysicalAbuseAttempts %in% c("1", "2", "3", "4", "5", "6", "7"), "Yes", "No"))
NDF3$SexualAssaults <- factor(ifelse(NDF3$SexualAssaults %in% c("1", "2", "3", "4", "5", "6", "7"), "Yes", "No"))

NDF3$HappinessLevel <- factor(ifelse(NDF3$HappinessLevel %in% c("2", "3"), "No", "Yes"))
NDF4$HappinessLevel <- factor(ifelse(NDF4$HappinessLevel %in% c("2", "3"), "No", "Yes"))
T1 <- xtabs(~HappinessLevel + PhysicalAbuseAttempts, data = NDF3)
T1
              PhysicalAbuseAttempts
HappinessLevel   No  Yes
           No   809  428
           Yes 4651  616
T2 <- xtabs(~HappinessLevel + SexualAssaults, data = NDF3)
T2
              SexualAssaults
HappinessLevel   No  Yes
           No  1106  131
           Yes 5082  185

\(H_0\): There is no relationship between happiness level and physical abuse attempts in the relationship.
\(H_a\): There is a relationship between happiness level and physical abuse attempts in the relationship.
The corresponding hypotheses for happiness level and sexual assaults are tested with T2.

If Happiness Level (HL) and Physical Abuse Attempts (PAA) are independent, then \(P(HL \cap PAA) = P(HL) \times P(PAA)\). We use this rule to calculate the expected count for each cell of T1, one cell at a time:

\(\text{Expected Count} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Table Total}}\)
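Applying the formula to every cell at once is an outer product of the row and column totals divided by the table total. The short sketch below uses the T1 counts copied from above and should reproduce `chisq.test(T1)$expected`.

```r
# Observed counts copied from T1 above
T1 <- matrix(c(809, 428,
               4651, 616),
             nrow = 2, byrow = TRUE,
             dimnames = list(HappinessLevel = c("No", "Yes"),
                             PhysicalAbuseAttempts = c("No", "Yes")))

# Expected count for each cell = row total * column total / table total
expected <- outer(rowSums(T1), colSums(T1)) / sum(T1)
round(expected, 3)
```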

chisq.test(T1)$expected
              PhysicalAbuseAttempts
HappinessLevel       No     Yes
           No  1038.441 198.559
           Yes 4421.559 845.441
tab <- prop.table(T1, 1)
tab
              PhysicalAbuseAttempts
HappinessLevel        No       Yes
           No  0.6540016 0.3459984
           Yes 0.8830454 0.1169546

When examining the association between happiness level (categorical response) and physical abuse attempts in the relationship (categorical explanatory), the row proportions above show that, among the young adults in my sample, 34.6% of those who reported being unhappy had experienced physical abuse attempts, compared with 11.7% of those who reported being happy.
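`prop.table(T1, 1)` divides by row totals, so it gives the share of unhappy (or happy) respondents who experienced physical abuse attempts. To compare the rate of unhappiness between those who did and did not experience physical abuse attempts, the proportions need to be taken within columns instead (margin 2). The sketch below uses the T1 counts copied from above.

```r
T1 <- matrix(c(809, 428,
               4651, 616),
             nrow = 2, byrow = TRUE,
             dimnames = list(HappinessLevel = c("No", "Yes"),
                             PhysicalAbuseAttempts = c("No", "Yes")))

# Column proportions: within each abuse group, the share who are unhappy vs. happy
col_props <- prop.table(T1, 2)
round(col_props, 4)
```

With these counts, about 41.0% of respondents who experienced physical abuse attempts reported being unhappy, compared with about 14.8% of those who did not.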

We then test this observation with a chi-square test of independence.

ChiS1 <- chisq.test(T1, correct = FALSE)
ChiS1

    Pearson's Chi-squared test

data:  T1
X-squared = 389.99, df = 1, p-value < 2.2e-16
ChiS11 <- chisq.test(T2, correct = FALSE)
ChiS11

    Pearson's Chi-squared test

data:  T2
X-squared = 108.56, df = 1, p-value < 2.2e-16

The chi-square tests give p-values of \(8.3042086\times 10^{-87}\) and \(2.0285769\times 10^{-25}\), both far smaller than 0.01. We therefore reject both null hypotheses in favor of the alternatives: there is a statistically significant relationship between happiness level and physical abuse attempts in the relationship (\(\chi^2 = 389.9934728, df = 1, p = 8.3042086\times 10^{-87} < 0.01\)) and between happiness level and sexual assaults in the relationship (\(\chi^2 = 108.5577038, df = 1, p = 2.0285769\times 10^{-25} < 0.01\)).
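The Pearson statistic sums (observed - expected)^2 / expected over the four cells. With `correct = FALSE`, no continuity correction is applied. The sketch below reproduces the T1 result from the counts printed above.

```r
T1 <- matrix(c(809, 428,
               4651, 616),
             nrow = 2, byrow = TRUE)

expected <- outer(rowSums(T1), colSums(T1)) / sum(T1)
x2  <- sum((T1 - expected)^2 / expected)       # Pearson chi-square statistic
dof <- (nrow(T1) - 1) * (ncol(T1) - 1)         # 1 degree of freedom for a 2 x 2 table
p   <- pchisq(x2, df = dof, lower.tail = FALSE)
c(X2 = x2, df = dof, p = p)
```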

Post-hoc Test

T2 <- xtabs(~HappinessLevel + PhysicalAbuseAttempts, data = NDF4)
T2
              PhysicalAbuseAttempts
HappinessLevel    0    1    2    3    4    5    6    7
           No   803  108  113   85   68   23   11   20
           Yes 3077  186  193  101   80   25   14   17
prop.table(T2, 1)
              PhysicalAbuseAttempts
HappinessLevel           0           1           2           3           4
           No  0.652315191 0.087733550 0.091795288 0.069049553 0.055239643
           Yes 0.833197942 0.050365556 0.052261034 0.027349039 0.021662605
              PhysicalAbuseAttempts
HappinessLevel           5           6           7
           No  0.018683997 0.008935825 0.016246954
           Yes 0.006769564 0.003790956 0.004603304
ChiS2 <- chisq.test(T2, correct = FALSE)
ChiS2

    Pearson's Chi-squared test

data:  T2
X-squared = 195.2, df = 7, p-value < 2.2e-16

A chi-square test of independence revealed that, among individuals in relationships, happiness level and physical abuse attempts frequency were significantly associated, \(\chi^2 = 195.1951337, df = 7, p = 1.1942317\times 10^{-38} < 0.01\).

n <- choose(ncol(T2), 2)  # number of pairwise column comparisons: choose(8, 2) = 28
limit <- 0.05/n
limit
[1] 0.001785714

There are 28 post hoc tests, one for each pair of the 8 physical abuse attempts frequency categories. The conventional significance level is 0.05. Running 28 tests, however, inflates the chance of at least one false positive, so we apply a Bonferroni adjustment and divide 0.05 by 28. A post hoc comparison is therefore considered significant only if its p-value is less than 0.0017857.
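Comparing each raw p-value with 0.05/28 is equivalent to multiplying each p-value by 28 (capping at 1) and comparing the result with 0.05. Base R's `p.adjust(..., method = "bonferroni")` performs exactly that multiplication. The sketch below uses three of the post hoc p-values reported in this section.

```r
m <- choose(8, 2)        # 8 categories -> 28 pairwise comparisons
alpha_adj <- 0.05 / m    # per-test threshold, about 0.0017857

# Three raw p-values taken from the pairwise tests in this section
p <- c("0-6" = 0.004245, "0-7" = 7.152e-07, "1-3" = 0.05101)

p < alpha_adj                                       # raw p compared with 0.05 / 28
p.adjust(p, method = "bonferroni", n = m) < 0.05    # same decisions via adjusted p
```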

chisq.test(T2[, c(1, 2)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(1, 2)]
X-squared = 41.204, df = 1, p-value = 1.371e-10
chisq.test(T2[, c(1, 3)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(1, 3)]
X-squared = 43.719, df = 1, p-value = 3.792e-11
chisq.test(T2[, c(1, 4)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(1, 4)]
X-squared = 65.003, df = 1, p-value = 7.48e-16
chisq.test(T2[, c(1, 5)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(1, 5)]
X-squared = 53.631, df = 1, p-value = 2.419e-13
chisq.test(T2[, c(1, 6)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(1, 6)]
X-squared = 21.156, df = 1, p-value = 4.235e-06
chisq.test(T2[, c(1, 7)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(1, 7)]
X-squared = 8.1759, df = 1, p-value = 0.004245
chisq.test(T2[, c(1, 8)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(1, 8)]
X-squared = 24.574, df = 1, p-value = 7.152e-07
chisq.test(T2[, c(2, 3)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(2, 3)]
X-squared = 0.0024107, df = 1, p-value = 0.9608
chisq.test(T2[, c(2, 4)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(2, 4)]
X-squared = 3.8079, df = 1, p-value = 0.05101
chisq.test(T2[, c(2, 5)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(2, 5)]
X-squared = 3.4856, df = 1, p-value = 0.06191
chisq.test(T2[, c(2, 6)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(2, 6)]
X-squared = 2.1832, df = 1, p-value = 0.1395
chisq.test(T2[, c(2, 7)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(2, 7)]
X-squared = 0.52001, df = 1, p-value = 0.4708
chisq.test(T2[, c(2, 8)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(2, 8)]
X-squared = 4.1566, df = 1, p-value = 0.04147
chisq.test(T2[, c(3, 4)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(3, 4)]
X-squared = 3.7006, df = 1, p-value = 0.05439
chisq.test(T2[, c(3, 5)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(3, 5)]
X-squared = 3.3838, df = 1, p-value = 0.06584
chisq.test(T2[, c(3, 6)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(3, 6)]
X-squared = 2.1176, df = 1, p-value = 0.1456
chisq.test(T2[, c(3, 7)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(3, 7)]
X-squared = 0.49337, df = 1, p-value = 0.4824
chisq.test(T2[, c(3, 8)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(3, 8)]
X-squared = 4.0781, df = 1, p-value = 0.04344
chisq.test(T2[, c(4, 5)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(4, 5)]
X-squared = 0.0020259, df = 1, p-value = 0.9641
chisq.test(T2[, c(4, 6)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(4, 6)]
X-squared = 0.075509, df = 1, p-value = 0.7835
chisq.test(T2[, c(4, 7)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(4, 7)]
X-squared = 0.025652, df = 1, p-value = 0.8728
chisq.test(T2[, c(4, 8)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(4, 8)]
X-squared = 0.86468, df = 1, p-value = 0.3524
chisq.test(T2[, c(5, 6)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(5, 6)]
X-squared = 0.056595, df = 1, p-value = 0.812
chisq.test(T2[, c(5, 7)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(5, 7)]
X-squared = 0.03264, df = 1, p-value = 0.8566
chisq.test(T2[, c(5, 8)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(5, 8)]
X-squared = 0.78022, df = 1, p-value = 0.3771
chisq.test(T2[, c(6, 7)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(6, 7)]
X-squared = 0.10134, df = 1, p-value = 0.7502
chisq.test(T2[, c(6, 8)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(6, 8)]
X-squared = 0.31486, df = 1, p-value = 0.5747
chisq.test(T2[, c(7, 8)], correct = FALSE)

    Pearson's Chi-squared test

data:  T2[, c(7, 8)]
X-squared = 0.60324, df = 1, p-value = 0.4373
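The 28 calls above follow one pattern, so they can be generated with `combn()` and `apply()`. The sketch below rebuilds the 8-column T2 from the counts printed above, runs every pairwise test, and keeps only the comparisons below the Bonferroni threshold.

```r
# Counts copied from the 8-column T2 printed above
T2 <- matrix(c(803, 108, 113, 85, 68, 23, 11, 20,
               3077, 186, 193, 101, 80, 25, 14, 17),
             nrow = 2, byrow = TRUE,
             dimnames = list(HappinessLevel = c("No", "Yes"),
                             PhysicalAbuseAttempts = as.character(0:7)))

pairs <- combn(ncol(T2), 2)        # 2 x 28 matrix of column-index pairs
limit <- 0.05 / ncol(pairs)        # Bonferroni threshold, 0.05 / 28

# Uncorrected Pearson chi-square test on each 2 x 2 sub-table
pvals <- apply(pairs, 2, function(idx) chisq.test(T2[, idx], correct = FALSE)$p.value)
names(pvals) <- apply(pairs, 2, function(idx) paste(colnames(T2)[idx], collapse = "-"))

pvals[pvals < limit]
```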

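Post hoc comparisons of happiness level across physical abuse attempts frequency categories showed that respondents who reported any physical abuse attempts had a higher rate of unhappiness than those who reported none. The one exception was the “11 to 20 times in the last year of the relationship” category (Column 0 and 6, p = 0.004245, above the adjusted threshold of 0.0017857), which had only 25 respondents. None of the comparisons among the nonzero frequency categories were significant. In total, six comparisons were significant: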

-“never” and “this has not happened in the past year, but it did happen before then” (Column 0 and 1)

-“never” and “once in the last year of the relationship” (Column 0 and 2)

-“never” and “twice in the last year of the relationship” (Column 0 and 3)

-“never” and “3 to 5 times in the last year of the relationship” (Column 0 and 4)

-“never” and “6 to 10 times in the last year of the relationship” (Column 0 and 5)

-“never” and “more than 20 times in the last year of the relationship” (Column 0 and 7)
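The direction of these differences is easiest to see from column proportions, that is, the share of respondents in each frequency category who reported being unhappy. (`prop.table(T2, 1)` above conditions on happiness instead.) The sketch below uses the counts copied from T2.

```r
T2 <- matrix(c(803, 108, 113, 85, 68, 23, 11, 20,
               3077, 186, 193, 101, 80, 25, 14, 17),
             nrow = 2, byrow = TRUE,
             dimnames = list(HappinessLevel = c("No", "Yes"),
                             PhysicalAbuseAttempts = as.character(0:7)))

# Share of respondents in each physical abuse attempts category who are unhappy
unhappy <- prop.table(T2, 2)["No", ]
round(unhappy, 3)
colSums(T2)   # category sizes; category 6 has only 25 respondents
```

With these counts, roughly 21% of respondents in the “never” category were unhappy, versus roughly 37% to 54% in the other categories.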

Correlation Coefficient \((r)\) \(Q→Q\)

NDF$PeopleMarried <- gsub(",", "", NDF$PeopleMarried)
NDF$PeopleMarried <- as.numeric(NDF$PeopleMarried)

ggplot(data = NDF, aes(x = PeopleMarried, y = TimeSpent)) + 
  geom_point() +
  theme_bw() +
  geom_smooth(method = "lm") + 
  labs(x = "Number of people the individual has married", y = "Time Spent Together (in years)", title = "Correlation Graph") 

My codebook has far more categorical variables than quantitative ones. The quantitative variables relevant to my research take only a few distinct values, so the scatterplot shows just 58 distinct points, with many observations overlapping. The number of plotted points says nothing by itself about the strength of the relationship. The scatterplot does, however, suggest a relationship that is positive and close to linear.

cor.test(NDF$PeopleMarried, NDF$TimeSpent)  # cor.test drops incomplete pairs itself; it has no use argument

    Pearson's product-moment correlation

data:  NDF$PeopleMarried and NDF$TimeSpent
t = 29.845, df = 4847, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3699609 0.4175227
sample estimates:
      cor 
0.3940055 
r <- cor(NDF$PeopleMarried, NDF$TimeSpent, use = "complete.obs")
r
[1] 0.3940055
variance <- r^2
variance
[1] 0.1552404

Among individuals in my sample, the correlation between the number of people they had married (quantitative) and the time spent together with their partner in years (quantitative) was 0.3940055 (p < 0.0001). This means that only 15.5240357% (i.e., 0.3940055 squared) of the variance in time spent together can be explained by the number of people they had married.

As mentioned above, the scatterplot suggests a positive relationship. However, the correlation between the two variables, r = 0.3940055, is closer to 0 than to 1, which indicates a weak-to-moderate linear relationship. The very small p-value reflects the large sample size (df = 4847) rather than the strength of the relationship.
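The significance test for a Pearson correlation uses \(t = r\sqrt{df/(1 - r^2)}\) with \(df = n - 2\), so for a fixed r the p-value shrinks as the sample grows. The sketch below reproduces the reported t = 29.845. For contrast, it also computes the p-value the same r would give with a hypothetical sample of 20 observations. The helper names `cor_t` and `cor_p` are illustrative.

```r
r <- 0.3940055

# Hypothetical helpers: t statistic and two-sided p-value for H0: rho = 0
cor_t <- function(r, df) r * sqrt(df / (1 - r^2))
cor_p <- function(r, df) 2 * pt(-abs(cor_t(r, df)), df)

t_full  <- cor_t(r, 4847)   # df reported by cor.test above
p_full  <- cor_p(r, 4847)
p_small <- cor_p(r, 18)     # hypothetical n = 20, so df = 18
c(t_full = t_full, p_full = p_full, p_small = p_small)
```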

Linear Regression

First, the alr4 package is loaded, which makes the domedata data set available. The variables of interest, Velocity and Dist, are selected with the select function from the dplyr package and stored in the data frame LRDF. The data frame has 34 observations and is used to investigate the relationship between the velocity of the baseball and the distance the baseball travels.

library(alr4)   # provides the domedata data set
library(dplyr)  # provides select() and the %>% pipe
summary(domedata)
   Cond       Velocity         Angle           BallWt         BallDia     
 Head:19   Min.   :149.3   Min.   :48.30   Min.   :140.1   Min.   :2.810  
 Tail:15   1st Qu.:154.1   1st Qu.:49.50   1st Qu.:140.1   1st Qu.:2.810  
           Median :155.5   Median :50.00   Median :141.0   Median :2.860  
           Mean   :155.2   Mean   :49.98   Mean   :140.7   Mean   :2.842  
           3rd Qu.:156.3   3rd Qu.:50.60   3rd Qu.:141.0   3rd Qu.:2.860  
           Max.   :160.9   Max.   :51.00   Max.   :141.9   Max.   :2.880  
      Dist      
 Min.   :329.3  
 1st Qu.:347.8  
 Median :351.9  
 Mean   :353.4  
 3rd Qu.:359.2  
 Max.   :373.8  
LRDF <- domedata %>%
  select(Velocity, Dist)
cor.test(LRDF$Velocity, LRDF$Dist)  # cor.test drops incomplete pairs itself; it has no use argument

    Pearson's product-moment correlation

data:  LRDF$Velocity and LRDF$Dist
t = 4.0574, df = 32, p-value = 0.000298
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3047263 0.7693615
sample estimates:
      cor 
0.5828323 
r2 <- cor(LRDF$Velocity, LRDF$Dist, use = "complete.obs")

The correlation, r = 0.5828323, indicates a moderately strong positive linear association (r > 0.5). A correlation, however, does not fully characterize the linear relationship between two quantitative variables: it measures only the strength and direction of the linear association. We therefore summarize the relationship by fitting the line that best fits the linear pattern of the data between the Velocity and Dist variables.

library(ggplot2)
ggplot(data = LRDF, aes(x = Velocity, y = Dist)) + 
    geom_point(color = "purple") +
    theme_bw() + 
    labs(x = "Velocity of the Baseball(feet/second)", y = "Distance Traveling(feet)") 

The graph above is a scatterplot of the two variables in the LRDF data frame: the velocity of the baseball (feet/second) and the distance traveled by the baseball (feet). Both the figure and the correlation test above indicate a positive, linear relationship between these two variables, which we investigate in more depth below.

mod.lm <- lm(Dist~Velocity, data = LRDF)
summary(mod.lm)

Call:
lm(formula = Dist ~ Velocity, data = LRDF)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.2292  -6.8905   0.1209   6.7111  12.9798 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -13.4767    90.4218  -0.149 0.882456    
Velocity      2.3638     0.5826   4.057 0.000298 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.095 on 32 degrees of freedom
Multiple R-squared:  0.3397,    Adjusted R-squared:  0.3191 
F-statistic: 16.46 on 1 and 32 DF,  p-value: 0.000298
plot(mod.lm, which = 1)

The residuals-versus-fitted plot shows no obvious pattern, so a straight line appears to be a reasonable summary of the relationship between the two variables.

MEAN <- apply(LRDF, 2, mean)
SD <- apply(LRDF, 2, sd)
MEAN
Velocity     Dist 
155.1882 353.3559 
SD
Velocity     Dist 
2.418710 9.809556 
coef(summary(mod.lm))
              Estimate Std. Error    t value     Pr(>|t|)
(Intercept) -13.476660 90.4218220 -0.1490421 0.8824557262
Velocity      2.363791  0.5825903  4.0573810 0.0002980013
b <- coef(summary(mod.lm))[2, 1]

The slope of the fitted line is:

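\[b = r \times \frac{s_{Dist}}{s_{Velocity}}\]

where \(r\) is the correlation and \(s_{Dist}\) and \(s_{Velocity}\) are the standard deviations of Dist and Velocity:

\[b = 0.5828323 \times \frac{9.809556}{2.4187105} = 2.3637909\]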

This means that for every 1-unit increase in the explanatory variable, the response variable increases, on average, by 2.3637909 units. Specifically, for every additional foot per second of velocity, the distance the baseball travels increases, on average, by about 2.36 feet.

The intercept of the line is: \(a = \overline{Dist} - b \times \overline{Velocity} = 353.3558824 - 2.3637909 \times 155.1882353 = -13.4766605\)

And therefore the least squares regression line for this case is:

\(\widehat{Dist} = 2.3637909 \times Velocity - 13.4766605\)
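The same arithmetic can be done in R with the correlation, standard deviations, and means printed above. Because these inputs are rounded, the results agree closely with, but not exactly to every digit of, `coef(mod.lm)`.

```r
r      <- 0.5828323                        # cor(Velocity, Dist)
sd_x   <- 2.418710;    sd_y   <- 9.809556  # SD of Velocity and Dist
mean_x <- 155.1882353; mean_y <- 353.3558824

b <- r * sd_y / sd_x        # least squares slope
a <- mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
c(intercept = a, slope = b)
```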

ggplot(data = LRDF, aes(x = Velocity, y = Dist)) + 
    geom_point(color = "purple") +
    theme_bw() + 
    labs(x = "Velocity of the Baseball (feet/second)", y = "Distance Traveling (feet)") + 
    geom_smooth(method = "lm", se = TRUE) +
    labs(title = expression(hat(Y) == "2.3637909x − 13.4766605") )

The figure above shows the regression line fitted to the scatterplot. The line captures the positive trend, although with \(R^2 = 0.3397\) roughly two-thirds of the variability in distance is not explained by velocity.

PV <- predict(mod.lm, newdata = data.frame(Velocity = 159))
PV
       1 
362.3661 

To find the predicted distance for a velocity of 159 feet/second, we plug Velocity = 159 into the regression equation:

Predicted distance = (2.3637909 * 159) - 13.4766605 = 362.3660972 feet. This is our best prediction of the distance traveled by a baseball with a velocity of 159 feet/second.
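The sketch below computes the same prediction from the fitted coefficients copied from `coef(summary(mod.lm))`. It also checks that 159 feet/second lies inside the observed velocity range (149.3 to 160.9 in `summary(domedata)`), so the prediction is not an extrapolation. The helper name `predict_dist` is hypothetical.

```r
# Coefficients copied from coef(summary(mod.lm)) above
a <- -13.476660
b <- 2.363791

# Hypothetical helper: predicted Dist (feet) for a given Velocity (feet/second)
predict_dist <- function(velocity) a + b * velocity

velocity_range <- c(149.3, 160.9)   # Min. and Max. of Velocity in summary(domedata)
v <- 159
in_range <- v >= velocity_range[1] && v <= velocity_range[2]
pred <- predict_dist(v)
pred
in_range
```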